Back-off as Parameter Estimation for DOP models

Author

  • Luciano Grüdtner Buratto
Abstract

Data-Oriented Parsing (DOP) is a probabilistic performance approach to parsing natural language. Since Scha (1990) introduced it, several DOP models have been proposed, with promising results. One important feature of these models is the probability estimation procedure. Two major estimators have been put forward: Bod (1993) uses a relative frequency estimator; Bonnema (1999) adds a rescaling factor to correct for tree size effects. Both estimators, however, are biased. Moreover, Bod’s estimator has been shown to be inconsistent (Johnson, 2002), meaning that the probability estimates hypothesized by the model do not approach the true probabilities that generated the data as the sample size grows. In this thesis, we implement a new estimation procedure that tackles the shortcomings of the two previous methods. The main idea is to treat derivation events not as disjoint, but as interrelated in a hierarchical cascade of parse tree derivations. We show that this new estimator, called the Back-Off DOP (BO-DOP) estimator, outperforms both previous estimators. We tested it on the OVIS treebank, which comes from a Dutch-language, speech-based system, and report error reductions of up to 11.4% and 15% when compared to Bod’s and Bonnema’s estimators, respectively.
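To make the relative frequency idea concrete, the following is a minimal sketch, not the thesis's implementation and not the Back-Off estimator. It assumes the usual formulation in which a subtree's probability is its count divided by the total count of all subtrees sharing its root label. The bracketed-string representation, the toy counts, and the names `root_label` and `relative_frequency` are illustrative assumptions.

```python
from collections import defaultdict


def root_label(subtree: str) -> str:
    """Return the root category of a bracketed subtree, e.g. '(NP DT NN)' -> 'NP'.

    Assumes the hypothetical format '(LABEL child child ...)'.
    """
    return subtree.lstrip("(").split()[0].rstrip(")")


def relative_frequency(subtree_counts: dict) -> dict:
    """Estimate P(t) = count(t) / sum of count(t') over subtrees t' with the same root."""
    totals = defaultdict(int)
    for subtree, count in subtree_counts.items():
        totals[root_label(subtree)] += count
    return {
        subtree: count / totals[root_label(subtree)]
        for subtree, count in subtree_counts.items()
    }


# Toy subtree counts (invented for illustration, not taken from any treebank).
counts = {
    "(S NP VP)": 4,
    "(NP DT NN)": 3,
    "(NP (DT the) NN)": 1,
    "(NP (DT the) (NN train))": 2,
}
probs = relative_frequency(counts)
for subtree, p in probs.items():
    print(f"{subtree}: {p:.3f}")
```

Because the normalisation is per root label, the probabilities of all subtrees with the same root sum to one. The abstract's point is that this simple scheme is biased and inconsistent, which is what the alternative estimators aim to address.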


Similar resources

Structured Parameter Estimation for LFG-DOP using Backoff

Despite its state-of-the-art performance, the Data Oriented Parsing (DOP) model has been shown to suffer from biased parameter estimation, and the good performance seems more the result of ad hoc adjustments than correct probabilistic generalization over the data. In recent work, we developed a new estimation procedure, called Backoff Estimation, for DOP models that are based on Phrase-Structur...


Backoff DOP: Parameter Estimation by Backoff

The Data Oriented Parsing (DOP) model currently achieves state-of-the-art parsing on benchmark corpora. However, existing DOP parameter estimation methods are known to be biased, and ad hoc adjustments are needed in order to reduce the effects of these biases on performance. This paper presents a novel estimation procedure that exploits a unique property of DOP: different derivations can generat...


Parameter Estimation in Spatial Generalized Linear Mixed Models with Skew Gaussian Random Effects using Laplace Approximation

Spatial generalized linear mixed models are commonly used for modelling non-Gaussian discrete spatial responses. We present an algorithm for parameter estimation of these models using a Laplace approximation of the likelihood function. In these models, the spatial correlation structure of the data is carried by random effects or latent variables. In most spatial analyses, it is assumed that rando...


Estimation of coal swelling index based on chemical properties of coal using artificial neural networks

Free swelling index (FSI) is an important parameter for cokeability and combustion of coals. In this research, the effects of chemical properties of coals on the coal free swelling index were studied by artificial neural network methods. The artificial neural networks (ANNs) method was used for 200 datasets to estimate the free swelling index value. In this investigation, ten input parameters ...


Using neural network to estimate Weibull parameters

As is well known, estimating the parameters of the three-parameter Weibull distribution is a complicated task and a sometimes contentious area, with several methods vying for recognition. The Weibull distribution is used frequently in reliability studies and has many applications in engineering. However, estimating the parameters of the Weibull distribution is crucial in classical ways. This distribution has t...



Publication date: 2002